Research Question

To what extent do funding disparities in traditional U.S. public schools suggest an underrepresentation of Black and Hispanic youth in high-cost sports?

Problem Statement

Sports specialization involves year-round training and competition and requires costly investments in participation fees, travel, and equipment, which create significant financial barriers for youth from lower socioeconomic backgrounds. In addition, public school funding disparities can limit access to appropriate facilities, personnel, or physical education, which could further reduce sports participation opportunities for youth in lower-SES communities. These disparities can contribute to the underrepresentation of Black and Hispanic youth in sports with high financial barriers, such as hockey, gymnastics, and tennis, whereas sports such as track and field are less expensive and therefore more accessible.

Potential Subtopics

  • Correlation between public school funding and facility quality
  • Connection between SES and physical activity/education

Data Definition

Public School Characteristics 2022-23

Last Updated: October 21, 2024

https://catalog.data.gov/dataset/public-school-characteristics-2022-23-451db

The National Center for Education Statistics (NCES) gathers demographic and geographic data about U.S. public schools, including enrollment and Title I status. The dataset also reports the number of students eligible for free or reduced-price lunch, from which an eligibility percentage can be derived. By combining this dataset with the YRBSS, researchers could analyze patterns between lower-SES students or schools and rates of physical activity.
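As a rough illustration of how an eligibility percentage could be derived from the counts in this dataset, the sketch below divides the free and reduced-price lunch total by total enrollment. The two rows are copied from the `head()` output shown later in this notebook; the `FRL_pct` column name is an assumption introduced here, not part of the NCES file.

```python
import pandas as pd

# Two rows copied from the head() output of this dataset (raw NCES column names)
schools = pd.DataFrame({
    "SCH_NAME": ["Albertville Middle School", "Albertville High School"],
    "TOTFRL": [697, 1254],       # free + reduced-price lunch eligible students
    "TOTAL": [890.0, 1712.0],    # total enrollment
})

# Share of enrolled students eligible for free or reduced-price lunch
# (FRL_pct is a hypothetical column name used only for this sketch)
schools["FRL_pct"] = (schools["TOTFRL"] / schools["TOTAL"] * 100).round(1)

print(schools[["SCH_NAME", "FRL_pct"]])
```

A percentage like this is only meaningful for rows where both counts are valid, so it would need to be computed after the coded missing values are handled.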

Additional Datasets of Interest

Nutrition, Physical Activity, and Obesity - Youth Risk Behavior Surveillance System

Last Updated: February 4, 2025

https://catalog.data.gov/dataset/nutrition-physical-activity-and-obesity-youth-risk-behavior-surveillance-system

Conducted by the Centers for Disease Control and Prevention (CDC), the Youth Risk Behavior Surveillance System (YRBSS) monitors health behaviors in middle and high school students nationwide. It collects data on physical activity and nutrition, along with geographic and socioeconomic factors. This data could be used to further research on the impact socioeconomic factors have on health behaviors.

Data Collection

In [1]:
import numpy as np                
import pandas as pd              
import matplotlib.pyplot as plt   
import seaborn as sns               

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

import warnings
warnings.filterwarnings('ignore')

Read the Data

In [2]:
# read_csv already returns a DataFrame, so no pd.DataFrame() wrapper is needed
psChar_23 = pd.read_csv('Public_School_Characteristics_2022-23.csv')
In [3]:
psChar_23.head(7)
Out[3]:
X Y OBJECTID NCESSCH SURVYEAR STABR LEAID ST_LEAID LEA_NAME SCH_NAME LSTREET1 LSTREET2 LCITY LSTATE LZIP LZIP4 PHONE CHARTER_TEXT VIRTUAL GSLO GSHI SCHOOL_LEVEL STATUS SCHOOL_TYPE_TEXT SY_STATUS_TEXT ULOCALE NMCNTY TOTFRL FRELCH REDLCH DIRECTCERT PK KG G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 UG AE TOTMENROL TOTFENROL TOTAL MEMBER FTE STUTERATIO AMALM AMALF AM ASALM ASALF AS BLALM BLALF BL HPALM HPALF HP HIALM HIALF HI TRALM TRALF TR WHALM WHALF WH LATCOD LONCOD
0 -86.206200 34.26020 1 10000500870 2022-2023 AL 100005 AL-101 Albertville City Albertville Middle School 600 E Alabama Ave NaN Albertville AL 35950 (256)878-2341 No Not Virtual 07 08 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 697 654 43 587 NaN NaN NaN NaN NaN NaN NaN NaN 440.0 450.0 NaN NaN NaN NaN NaN NaN NaN 459.0 431.0 890.0 890.0 45.000000 19.78 4.0 1.0 5.0 4.0 2.0 6.0 15.0 14.0 29.0 0.0 1.0 1.0 251.0 251.0 502.0 17.0 15.0 32.0 168.0 147.0 315.0 34.26020 -86.206200
1 -86.204900 34.26220 2 10000500871 2022-2023 AL 100005 AL-101 Albertville City Albertville High School 402 E McCord Ave NaN Albertville AL 35950 2322 (256)894-5000 No Not Virtual 09 12 High 1 Regular School Currently operational 32-Town: Distant Marshall County 1254 1178 76 1059 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 493.0 442.0 390.0 387.0 NaN NaN NaN 868.0 844.0 1712.0 1712.0 85.199997 20.09 0.0 2.0 2.0 4.0 5.0 9.0 23.0 34.0 57.0 0.0 0.0 0.0 490.0 468.0 958.0 26.0 19.0 45.0 325.0 316.0 641.0 34.26220 -86.204900
2 -86.220100 34.27330 3 10000500879 2022-2023 AL 100005 AL-101 Albertville City Albertville Intermediate School 901 W McKinney Ave NaN Albertville AL 35950 1300 (256)878-7698 No Not Virtual 05 06 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 718 665 53 570 NaN NaN NaN NaN NaN NaN 412.0 462.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 451.0 423.0 874.0 874.0 43.000000 20.33 1.0 4.0 5.0 4.0 0.0 4.0 22.0 28.0 50.0 0.0 0.0 0.0 263.0 241.0 504.0 7.0 6.0 13.0 154.0 144.0 298.0 34.27330 -86.220100
3 -86.221806 34.25270 4 10000500889 2022-2023 AL 100005 AL-101 Albertville City Albertville Elementary School 145 West End Drive NaN Albertville AL 35950 (256)894-4822 No Not Virtual 03 04 Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 723 680 43 583 NaN NaN NaN NaN 430.0 444.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 463.0 411.0 874.0 874.0 43.000000 20.33 0.0 4.0 4.0 1.0 3.0 4.0 22.0 16.0 38.0 0.0 0.0 0.0 261.0 236.0 497.0 11.0 16.0 27.0 168.0 136.0 304.0 34.25270 -86.221806
4 -86.193300 34.28980 5 10000501616 2022-2023 AL 100005 AL-101 Albertville City Albertville Kindergarten and PreK 257 Country Club Rd NaN Albertville AL 35951 3927 (256)878-7922 No Not Virtual PK KG Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 392 367 25 240 133.0 473.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 304.0 302.0 606.0 606.0 26.000000 23.31 1.0 3.0 4.0 2.0 0.0 2.0 26.0 23.0 49.0 0.0 0.0 0.0 167.0 152.0 319.0 4.0 4.0 8.0 104.0 120.0 224.0 34.28980 -86.193300
5 -86.221800 34.25330 6 10000502150 2022-2023 AL 100005 AL-101 Albertville City Albertville Primary School 1100 Horton Rd NaN Albertville AL 35950 2532 (256)878-6611 No Not Virtual 01 02 Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 779 726 53 617 0.0 NaN 427.0 517.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 498.0 446.0 944.0 944.0 61.000000 15.48 9.0 1.0 10.0 3.0 0.0 3.0 24.0 21.0 45.0 0.0 1.0 1.0 290.0 256.0 546.0 9.0 10.0 19.0 163.0 157.0 320.0 34.25330 -86.221800
6 -86.254153 34.53375 7 10000600193 2022-2023 AL 100006 AL-048 Marshall County Kate Duncan Smith DAR Middle 6077 Main St NaN Grant AL 35747 (256)728-5950 No Not Virtual 05 08 Middle 1 Regular School Currently operational 42-Rural: Distant Marshall County 151 123 28 194 NaN NaN NaN NaN NaN NaN 95.0 97.0 86.0 86.0 NaN NaN NaN NaN NaN NaN NaN 192.0 172.0 364.0 364.0 22.030001 16.52 1.0 3.0 4.0 0.0 0.0 0.0 2.0 0.0 2.0 0.0 0.0 0.0 6.0 8.0 14.0 5.0 9.0 14.0 178.0 152.0 330.0 34.53375 -86.254153
In [4]:
psChar_23.tail(7)
Out[4]:
X Y OBJECTID NCESSCH SURVYEAR STABR LEAID ST_LEAID LEA_NAME SCH_NAME LSTREET1 LSTREET2 LCITY LSTATE LZIP LZIP4 PHONE CHARTER_TEXT VIRTUAL GSLO GSHI SCHOOL_LEVEL STATUS SCHOOL_TYPE_TEXT SY_STATUS_TEXT ULOCALE NMCNTY TOTFRL FRELCH REDLCH DIRECTCERT PK KG G01 G02 G03 G04 G05 G06 G07 G08 G09 G10 G11 G12 G13 UG AE TOTMENROL TOTFENROL TOTAL MEMBER FTE STUTERATIO AMALM AMALF AM ASALM ASALF AS BLALM BLALF BL HPALM HPALF HP HIALM HIALF HI TRALM TRALF TR WHALM WHALF WH LATCOD LONCOD
101383 -64.932456 18.352146 101384 780003000020 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District JOSEPH SIBILLY ELEMENTARY SCHOOL 14 15 16 ESTATE ELIZABETH NaN Saint Thomas VI 802 (340)774-7001 N Not Virtual PK 06 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 228 228 0 -1 19.0 25.0 25.0 25.0 31.0 34.0 34.0 38.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 121.0 110.0 231.0 231.0 16.0 14.44 0.0 0.0 0.0 2.0 2.0 4.0 99.0 93.0 192.0 0.0 0.0 0.0 8.0 5.0 13.0 2.0 1.0 3.0 10.0 9.0 19.0 18.352146 -64.932456
101384 -64.793916 18.330464 101385 780003000022 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District JULIUS E SPRAUVE 14 18 ESTATE ENIGHED NaN Saint John VI 831 (340)776-6336 N Not Virtual PK 08 Elementary 1 Regular School Currently operational 33-Town: Remote St. John Island 199 199 0 -1 8.0 21.0 16.0 21.0 14.0 24.0 20.0 26.0 27.0 25.0 NaN NaN NaN NaN NaN NaN NaN 103.0 99.0 202.0 202.0 20.0 10.10 1.0 0.0 1.0 0.0 0.0 0.0 79.0 68.0 147.0 0.0 0.0 0.0 22.0 29.0 51.0 0.0 0.0 0.0 1.0 2.0 3.0 18.330464 -64.793916
101385 -64.917602 18.341950 101386 780003000024 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District LOCKHART ELEMENTARY SCHOOL 41 ESTATE THOMAS NaN Saint Thomas VI 802 (340)775-0820 N Not Virtual KG 03 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 295 295 0 -1 NaN 77.0 75.0 69.0 77.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 171.0 127.0 298.0 298.0 18.0 16.56 0.0 0.0 0.0 4.0 3.0 7.0 132.0 92.0 224.0 0.0 0.0 0.0 33.0 30.0 63.0 1.0 2.0 3.0 1.0 0.0 1.0 18.341950 -64.917602
101386 -64.952483 18.338742 101387 780003000026 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District ULLA F MULLER ELEMENTARY SCHOOL 7B ESTATE CONTANT NaN Saint Thomas VI 802 (340)774-0059 N Not Virtual KG 06 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 417 417 0 -1 NaN 52.0 53.0 51.0 47.0 70.0 79.0 68.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 200.0 220.0 420.0 420.0 28.0 15.00 0.0 2.0 2.0 2.0 4.0 6.0 167.0 182.0 349.0 0.0 0.0 0.0 27.0 27.0 54.0 2.0 0.0 2.0 2.0 5.0 7.0 18.338742 -64.952483
101387 -64.899024 18.354782 101388 780003000027 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District YVONNE BOWSKY ELEMENTARY SCHOOL 15B and 16 ESTATE MANDAHL NaN Saint Thomas VI 802 (340)775-3220 N Not Virtual PK 05 Elementary 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 425 425 0 -1 22.0 62.0 67.0 66.0 75.0 68.0 68.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 252.0 176.0 428.0 428.0 34.0 12.59 1.0 1.0 2.0 5.0 4.0 9.0 201.0 144.0 345.0 0.0 0.0 0.0 37.0 22.0 59.0 0.0 1.0 1.0 8.0 4.0 12.0 18.354782 -64.899024
101388 -64.945940 18.336658 101389 780003000033 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District CANCRYN JUNIOR HIGH SCHOOL 1 CROWN BAY NaN Saint Thomas VI 804 (340)774-4540 N Not Virtual 04 08 Middle 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 683 683 0 -1 NaN NaN NaN NaN NaN 77.0 119.0 96.0 189.0 205.0 NaN NaN NaN NaN NaN NaN NaN 361.0 325.0 686.0 686.0 62.0 11.06 0.0 0.0 0.0 2.0 2.0 4.0 279.0 250.0 529.0 0.0 0.0 0.0 74.0 62.0 136.0 0.0 1.0 1.0 6.0 10.0 16.0 18.336658 -64.945940
101389 -64.890311 18.318230 101390 780003000034 2022-2023 VI 7800030 VI-001 Saint Thomas - Saint John School District BERTHA BOSCHULTE JUNIOR HIGH 9 1 and 12A BOVONI NaN Saint Thomas VI 802 (340)775-4222 N Not Virtual 06 08 Middle 1 Regular School Currently operational 33-Town: Remote St. Thomas Island 504 504 0 -1 NaN NaN NaN NaN NaN NaN NaN 145.0 169.0 193.0 NaN NaN NaN NaN NaN NaN NaN 279.0 228.0 507.0 507.0 49.0 10.35 0.0 0.0 0.0 2.0 1.0 3.0 250.0 204.0 454.0 0.0 0.0 0.0 27.0 21.0 48.0 0.0 0.0 0.0 0.0 2.0 2.0 18.318230 -64.890311
In [5]:
psChar_23.shape
Out[5]:
(101390, 77)
  • The dataframe has 101,390 rows of data.
  • The dataframe has 77 columns or features.
  • There are 7,807,030 total data points in the dataset (101,390 rows × 77 columns).
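The total number of cells can be read from the DataFrame rather than multiplied by hand. The following is a minimal sketch on a hypothetical toy frame (`toy` is not part of the notebook); on the real data, `psChar_23.size` would give the product of the two values returned by `shape`.

```python
import pandas as pd

# Hypothetical toy frame standing in for psChar_23
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, None, 6.0]})

n_rows, n_cols = toy.shape
total_cells = toy.size                  # rows * columns, missing cells included
non_missing = int(toy.count().sum())    # cells that actually hold a value

print(n_rows, n_cols, total_cells, non_missing)
```

Note that `size` counts every cell, including missing ones, so the number of observed values is smaller whenever a column contains NaN.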
In [6]:
psChar_23.info(show_counts=True, verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101390 entries, 0 to 101389
Data columns (total 77 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   X                 101390 non-null  float64
 1   Y                 101390 non-null  float64
 2   OBJECTID          101390 non-null  int64  
 3   NCESSCH           101390 non-null  int64  
 4   SURVYEAR          101390 non-null  object 
 5   STABR             101390 non-null  object 
 6   LEAID             101390 non-null  int64  
 7   ST_LEAID          101390 non-null  object 
 8   LEA_NAME          101390 non-null  object 
 9   SCH_NAME          101390 non-null  object 
 10  LSTREET1          101389 non-null  object 
 11  LSTREET2          572 non-null     object 
 12  LCITY             101390 non-null  object 
 13  LSTATE            101390 non-null  object 
 14  LZIP              101390 non-null  int64  
 15  LZIP4             101390 non-null  object 
 16  PHONE             101390 non-null  object 
 17  CHARTER_TEXT      101390 non-null  object 
 18  VIRTUAL           101390 non-null  object 
 19  GSLO              101390 non-null  object 
 20  GSHI              101390 non-null  object 
 21  SCHOOL_LEVEL      101390 non-null  object 
 22  STATUS            101390 non-null  int64  
 23  SCHOOL_TYPE_TEXT  101390 non-null  object 
 24  SY_STATUS_TEXT    101390 non-null  object 
 25  ULOCALE           101390 non-null  object 
 26  NMCNTY            101390 non-null  object 
 27  TOTFRL            101390 non-null  int64  
 28  FRELCH            101390 non-null  int64  
 29  REDLCH            101390 non-null  int64  
 30  DIRECTCERT        101390 non-null  int64  
 31  PK                32392 non-null   float64
 32  KG                54061 non-null   float64
 33  G01               54412 non-null   float64
 34  G02               54469 non-null   float64
 35  G03               54459 non-null   float64
 36  G04               54258 non-null   float64
 37  G05               53014 non-null   float64
 38  G06               38023 non-null   float64
 39  G07               33224 non-null   float64
 40  G08               33492 non-null   float64
 41  G09               28101 non-null   float64
 42  G10               27889 non-null   float64
 43  G11               27888 non-null   float64
 44  G12               27816 non-null   float64
 45  G13               133 non-null     float64
 46  UG                7889 non-null    float64
 47  AE                183 non-null     float64
 48  TOTMENROL         98910 non-null   float64
 49  TOTFENROL         98910 non-null   float64
 50  TOTAL             99719 non-null   float64
 51  MEMBER            99719 non-null   float64
 52  FTE               97537 non-null   float64
 53  STUTERATIO        99576 non-null   float64
 54  AMALM             98809 non-null   float64
 55  AMALF             98811 non-null   float64
 56  AM                98857 non-null   float64
 57  ASALM             98898 non-null   float64
 58  ASALF             98900 non-null   float64
 59  AS                98906 non-null   float64
 60  BLALM             98896 non-null   float64
 61  BLALF             98893 non-null   float64
 62  BL                98903 non-null   float64
 63  HPALM             98782 non-null   float64
 64  HPALF             98783 non-null   float64
 65  HP                98829 non-null   float64
 66  HIALM             98909 non-null   float64
 67  HIALF             98910 non-null   float64
 68  HI                98910 non-null   float64
 69  TRALM             98903 non-null   float64
 70  TRALF             98905 non-null   float64
 71  TR                98906 non-null   float64
 72  WHALM             98909 non-null   float64
 73  WHALF             98909 non-null   float64
 74  WH                98910 non-null   float64
 75  LATCOD            101390 non-null  float64
 76  LONCOD            101390 non-null  float64
dtypes: float64(48), int64(9), object(20)
memory usage: 59.6+ MB
In [7]:
ps23Cols = psChar_23.columns
ps23Cols
Out[7]:
Index(['X', 'Y', 'OBJECTID', 'NCESSCH', 'SURVYEAR', 'STABR', 'LEAID',
       'ST_LEAID', 'LEA_NAME', 'SCH_NAME', 'LSTREET1', 'LSTREET2', 'LCITY',
       'LSTATE', 'LZIP', 'LZIP4', 'PHONE', 'CHARTER_TEXT', 'VIRTUAL', 'GSLO',
       'GSHI', 'SCHOOL_LEVEL', 'STATUS', 'SCHOOL_TYPE_TEXT', 'SY_STATUS_TEXT',
       'ULOCALE', 'NMCNTY', 'TOTFRL', 'FRELCH', 'REDLCH', 'DIRECTCERT', 'PK',
       'KG', 'G01', 'G02', 'G03', 'G04', 'G05', 'G06', 'G07', 'G08', 'G09',
       'G10', 'G11', 'G12', 'G13', 'UG', 'AE', 'TOTMENROL', 'TOTFENROL',
       'TOTAL', 'MEMBER', 'FTE', 'STUTERATIO', 'AMALM', 'AMALF', 'AM', 'ASALM',
       'ASALF', 'AS', 'BLALM', 'BLALF', 'BL', 'HPALM', 'HPALF', 'HP', 'HIALM',
       'HIALF', 'HI', 'TRALM', 'TRALF', 'TR', 'WHALM', 'WHALF', 'WH', 'LATCOD',
       'LONCOD'],
      dtype='object')
In [8]:
psChar_23 = psChar_23.rename(columns = {'OBJECTID':'ObjectID','NCESSCH':'NCESID','SURVYEAR':'SurveyYear', 
                                        'STABR':'StateABR','LEA_NAME':'LEAname','SCH_NAME':'SchoolName', 
                                        'LSTREET1':'Street1','LSTREET2':'Street2','LCITY':'City',
                                        'LSTATE':'State','LZIP':'Zip','LZIP4':'Zip4', 
                                        'PHONE':'Phone', 'CHARTER_TEXT':'Charter', 'VIRTUAL':'Virtual', 
                                        'GSLO':'LowestGrade','GSHI':'HighestGrade', 
                                        'SCHOOL_LEVEL':'SchoolLevel', 
                                        'STATUS':'Status', 'SCHOOL_TYPE_TEXT':'SchoolType', 
                                        'SY_STATUS_TEXT':'Status_Text',
                                        'ULOCALE':'Locale', 'NMCNTY':'County', 
                                        'TOTFRL':'TotalFreeLunch', 
                                        'FRELCH':'FreeLunch', 'REDLCH':'ReducedLunch', 
                                        'DIRECTCERT':'MealProgramCertified', 'PK':'PreK',
                                        'KG':'Kindergarten', 'G01':'Grade1', 'G02':'Grade2', 
                                        'G03':'Grade3', 'G04':'Grade4', 'G05':'Grade5', 
                                        'G06':'Grade6', 'G07':'Grade7', 'G08':'Grade8', 
                                        'G09':'Grade9','G10':'Grade10', 'G11':'Grade11', 
                                        'G12':'Grade12','G13':'Grade13', 'UG':'Ungraded', 
                                        'AE':'AdultEd', 'TOTMENROL':'TotMaleEnrollment', 
                                        'TOTFENROL':'TotFemaleEnrollment','TOTAL':'TotalEnrollment', 
                                        'MEMBER':'Member', 'FTE':'StaffFTE', 'STUTERATIO':'StudentTeacherRatio', 
                                        'AMALM':'AIANMale','AMALF':'AIANFem', 'AM':'AIANTotal', 
                                        'ASALM':'AsianMale', 'ASALF':'AsianFemale', 'AS':'AsianTotal', 
                                        'BLALM':'BlackMale','BLALF':'BlackFemale', 'BL':'BlackTotal', 
                                        'HPALM':'HPIMale', 'HPALF':'HPIFemale', 'HP':'HPITotal', 
                                        'HIALM':'HispanicMale','HIALF':'HispanicFemale', 'HI':'HispanicTotal', 
                                        'TRALM':'TRMale', 'TRALF':'TRFemale', 'TR':'TRTotal', 
                                        'WHALM':'WhiteMale','WHALF':'WhiteFemale', 'WH':'WhiteTotal', 
                                        'LATCOD':'Latitude','LONCOD':'Longitude'})

ps23Cols = psChar_23.columns
psChar_23.head()
Out[8]:
X Y ObjectID NCESID SurveyYear StateABR LEAID ST_LEAID LEAname SchoolName Street1 Street2 City State Zip Zip4 Phone Charter Virtual LowestGrade HighestGrade SchoolLevel Status SchoolType Status_Text Locale County TotalFreeLunch FreeLunch ReducedLunch MealProgramCertified PreK Kindergarten Grade1 Grade2 Grade3 Grade4 Grade5 Grade6 Grade7 Grade8 Grade9 Grade10 Grade11 Grade12 Grade13 Ungraded AdultEd TotMaleEnrollment TotFemaleEnrollment TotalEnrollment Member StaffFTE StudentTeacherRatio AIANMale AIANFem AIANTotal AsianMale AsianFemale AsianTotal BlackMale BlackFemale BlackTotal HPIMale HPIFemale HPITotal HispanicMale HispanicFemale HispanicTotal TRMale TRFemale TRTotal WhiteMale WhiteFemale WhiteTotal Latitude Longitude
0 -86.206200 34.2602 1 10000500870 2022-2023 AL 100005 AL-101 Albertville City Albertville Middle School 600 E Alabama Ave NaN Albertville AL 35950 (256)878-2341 No Not Virtual 07 08 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 697 654 43 587 NaN NaN NaN NaN NaN NaN NaN NaN 440.0 450.0 NaN NaN NaN NaN NaN NaN NaN 459.0 431.0 890.0 890.0 45.000000 19.78 4.0 1.0 5.0 4.0 2.0 6.0 15.0 14.0 29.0 0.0 1.0 1.0 251.0 251.0 502.0 17.0 15.0 32.0 168.0 147.0 315.0 34.2602 -86.206200
1 -86.204900 34.2622 2 10000500871 2022-2023 AL 100005 AL-101 Albertville City Albertville High School 402 E McCord Ave NaN Albertville AL 35950 2322 (256)894-5000 No Not Virtual 09 12 High 1 Regular School Currently operational 32-Town: Distant Marshall County 1254 1178 76 1059 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 493.0 442.0 390.0 387.0 NaN NaN NaN 868.0 844.0 1712.0 1712.0 85.199997 20.09 0.0 2.0 2.0 4.0 5.0 9.0 23.0 34.0 57.0 0.0 0.0 0.0 490.0 468.0 958.0 26.0 19.0 45.0 325.0 316.0 641.0 34.2622 -86.204900
2 -86.220100 34.2733 3 10000500879 2022-2023 AL 100005 AL-101 Albertville City Albertville Intermediate School 901 W McKinney Ave NaN Albertville AL 35950 1300 (256)878-7698 No Not Virtual 05 06 Middle 1 Regular School Currently operational 32-Town: Distant Marshall County 718 665 53 570 NaN NaN NaN NaN NaN NaN 412.0 462.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 451.0 423.0 874.0 874.0 43.000000 20.33 1.0 4.0 5.0 4.0 0.0 4.0 22.0 28.0 50.0 0.0 0.0 0.0 263.0 241.0 504.0 7.0 6.0 13.0 154.0 144.0 298.0 34.2733 -86.220100
3 -86.221806 34.2527 4 10000500889 2022-2023 AL 100005 AL-101 Albertville City Albertville Elementary School 145 West End Drive NaN Albertville AL 35950 (256)894-4822 No Not Virtual 03 04 Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 723 680 43 583 NaN NaN NaN NaN 430.0 444.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 463.0 411.0 874.0 874.0 43.000000 20.33 0.0 4.0 4.0 1.0 3.0 4.0 22.0 16.0 38.0 0.0 0.0 0.0 261.0 236.0 497.0 11.0 16.0 27.0 168.0 136.0 304.0 34.2527 -86.221806
4 -86.193300 34.2898 5 10000501616 2022-2023 AL 100005 AL-101 Albertville City Albertville Kindergarten and PreK 257 Country Club Rd NaN Albertville AL 35951 3927 (256)878-7922 No Not Virtual PK KG Elementary 1 Regular School Currently operational 32-Town: Distant Marshall County 392 367 25 240 133.0 473.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 304.0 302.0 606.0 606.0 26.000000 23.31 1.0 3.0 4.0 2.0 0.0 2.0 26.0 23.0 49.0 0.0 0.0 0.0 167.0 152.0 319.0 4.0 4.0 8.0 104.0 120.0 224.0 34.2898 -86.193300
In [9]:
psChar_23.isnull().sum()
Out[9]:
X                            0
Y                            0
ObjectID                     0
NCESID                       0
SurveyYear                   0
StateABR                     0
LEAID                        0
ST_LEAID                     0
LEAname                      0
SchoolName                   0
Street1                      1
Street2                 100818
City                         0
State                        0
Zip                          0
Zip4                         0
Phone                        0
Charter                      0
Virtual                      0
LowestGrade                  0
HighestGrade                 0
SchoolLevel                  0
Status                       0
SchoolType                   0
Status_Text                  0
Locale                       0
County                       0
TotalFreeLunch               0
FreeLunch                    0
ReducedLunch                 0
MealProgramCertified         0
PreK                     68998
Kindergarten             47329
Grade1                   46978
Grade2                   46921
Grade3                   46931
Grade4                   47132
Grade5                   48376
Grade6                   63367
Grade7                   68166
Grade8                   67898
Grade9                   73289
Grade10                  73501
Grade11                  73502
Grade12                  73574
Grade13                 101257
Ungraded                 93501
AdultEd                 101207
TotMaleEnrollment         2480
TotFemaleEnrollment       2480
TotalEnrollment           1671
Member                    1671
StaffFTE                  3853
StudentTeacherRatio       1814
AIANMale                  2581
AIANFem                   2579
AIANTotal                 2533
AsianMale                 2492
AsianFemale               2490
AsianTotal                2484
BlackMale                 2494
BlackFemale               2497
BlackTotal                2487
HPIMale                   2608
HPIFemale                 2607
HPITotal                  2561
HispanicMale              2481
HispanicFemale            2480
HispanicTotal             2480
TRMale                    2487
TRFemale                  2485
TRTotal                   2484
WhiteMale                 2481
WhiteFemale               2481
WhiteTotal                2480
Latitude                     0
Longitude                    0
dtype: int64
In [10]:
def missing(DataFrame):
    print('Percentage of missing values in the dataset:\n',
          round((DataFrame.isnull().sum() *100/len(DataFrame)), 2).sort_values(ascending=False))

missing(psChar_23)
Percentage of missing values in the dataset:
 Grade13                 99.87
AdultEd                 99.82
Street2                 99.44
Ungraded                92.22
Grade12                 72.57
Grade10                 72.49
Grade11                 72.49
Grade9                  72.28
PreK                    68.05
Grade7                  67.23
Grade8                  66.97
Grade6                  62.50
Grade5                  47.71
Kindergarten            46.68
Grade4                  46.49
Grade1                  46.33
Grade3                  46.29
Grade2                  46.28
StaffFTE                 3.80
HPIFemale                2.57
HPIMale                  2.57
AIANMale                 2.55
AIANFem                  2.54
HPITotal                 2.53
AIANTotal                2.50
AsianMale                2.46
BlackMale                2.46
AsianFemale              2.46
BlackFemale              2.46
WhiteFemale              2.45
WhiteTotal               2.45
TRTotal                  2.45
AsianTotal               2.45
BlackTotal               2.45
HispanicMale             2.45
HispanicFemale           2.45
HispanicTotal            2.45
WhiteMale                2.45
TotFemaleEnrollment      2.45
TRFemale                 2.45
TRMale                   2.45
TotMaleEnrollment        2.45
StudentTeacherRatio      1.79
TotalEnrollment          1.65
Member                   1.65
City                     0.00
Street1                  0.00
SchoolName               0.00
LEAname                  0.00
LEAID                    0.00
ST_LEAID                 0.00
StateABR                 0.00
SurveyYear               0.00
X                        0.00
NCESID                   0.00
ObjectID                 0.00
Y                        0.00
ReducedLunch             0.00
MealProgramCertified     0.00
TotalFreeLunch           0.00
FreeLunch                0.00
Zip                      0.00
State                    0.00
Zip4                     0.00
Phone                    0.00
Charter                  0.00
Virtual                  0.00
LowestGrade              0.00
HighestGrade             0.00
SchoolLevel              0.00
Status                   0.00
SchoolType               0.00
Status_Text              0.00
Locale                   0.00
County                   0.00
Latitude                 0.00
Longitude                0.00
dtype: float64

Observations

Eighteen columns have more than forty-five percent of their values missing. For the 'Grade' columns, this is expected: the dataset includes schools at various education levels, so a given school does not offer every grade. The columns for free/reduced lunch and the student-to-teacher ratio also contain missing values, but for the lunch columns these do not appear in the null counts above because they are stored as numeric codes rather than NaN. As indicated in the online description of this dataset, -1 indicates that data is missing, -2 or N indicates that data is not applicable, and -9 indicates that data did not meet NCES data quality standards. Given this information, I would drop the AdultEd and Grade13 columns, as this research is focused only on youth sports participation in traditional public schools. I would also drop the columns 'Phone', 'LEAname', 'LEAID', 'ST_LEAID', 'SurveyYear', 'StaffFTE', 'Member', and 'NCESID', as they are not necessary for analysis. I also plan to remove the rows with negative values in the lunch and student-to-teacher ratio columns.
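Because the coded values are ordinary numbers, `isnull()` does not count them. The sketch below, on hypothetical toy rows, shows one way to count the codes and convert them to NaN; the `toy`, `sentinels`, `coded_counts`, and `cleaned` names are assumptions made for illustration, and this is an alternative to the row filter used later in this notebook, not the method applied here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; -1, -2 and -9 are the codes described in the text
toy = pd.DataFrame({
    "FreeLunch": [654, -1, 367, -9],
    "ReducedLunch": [43, -1, 25, -2],
    "StudentTeacherRatio": [19.78, 14.44, np.nan, 20.09],
})

sentinels = [-1, -2, -9]

# isnull() only sees the genuine NaN, not the coded values
print(toy.isnull().sum())

# Count coded values per column, then convert them to NaN
coded_counts = toy.isin(sentinels).sum()
cleaned = toy.replace(sentinels, np.nan)

print(coded_counts)
print(cleaned.isnull().sum())
```

Converting the codes to NaN keeps the school in the dataset for analyses that do not need the affected column, whereas dropping the row removes it entirely.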

In [11]:
dropCols = ['AdultEd','Phone','LEAname','LEAID','ST_LEAID','SurveyYear','StaffFTE','Member','NCESID','Grade13']

psChar_23 = psChar_23.drop(columns=dropCols)

psChar_23.isnull().sum()
Out[11]:
X                            0
Y                            0
ObjectID                     0
StateABR                     0
SchoolName                   0
Street1                      1
Street2                 100818
City                         0
State                        0
Zip                          0
Zip4                         0
Charter                      0
Virtual                      0
LowestGrade                  0
HighestGrade                 0
SchoolLevel                  0
Status                       0
SchoolType                   0
Status_Text                  0
Locale                       0
County                       0
TotalFreeLunch               0
FreeLunch                    0
ReducedLunch                 0
MealProgramCertified         0
PreK                     68998
Kindergarten             47329
Grade1                   46978
Grade2                   46921
Grade3                   46931
Grade4                   47132
Grade5                   48376
Grade6                   63367
Grade7                   68166
Grade8                   67898
Grade9                   73289
Grade10                  73501
Grade11                  73502
Grade12                  73574
Ungraded                 93501
TotMaleEnrollment         2480
TotFemaleEnrollment       2480
TotalEnrollment           1671
StudentTeacherRatio       1814
AIANMale                  2581
AIANFem                   2579
AIANTotal                 2533
AsianMale                 2492
AsianFemale               2490
AsianTotal                2484
BlackMale                 2494
BlackFemale               2497
BlackTotal                2487
HPIMale                   2608
HPIFemale                 2607
HPITotal                  2561
HispanicMale              2481
HispanicFemale            2480
HispanicTotal             2480
TRMale                    2487
TRFemale                  2485
TRTotal                   2484
WhiteMale                 2481
WhiteFemale               2481
WhiteTotal                2480
Latitude                     0
Longitude                    0
dtype: int64
In [12]:
# Check which operational statuses appear in the dataset
psChar_23["Status_Text"].unique()

# Drop schools that are not yet open or are temporarily closed
psChar_23 = psChar_23[~psChar_23["Status_Text"].str.contains(
    "School to be operational within two years|School temporarily closed", na=False)]
In [34]:
# Check the school types listed in the dataset
psChar_23["SchoolType"].unique()

# Keep only traditional ("Regular School") schools
psChar_23 = psChar_23[psChar_23["SchoolType"].str.contains(
    "Regular School", na=False)]
In [14]:
# filter out negative FRPL (free and reduced price lunch) values & student teacher ratios
negativeCols = ['ReducedLunch', 'MealProgramCertified','FreeLunch','StudentTeacherRatio']

psChar_23 = psChar_23[(psChar_23[negativeCols] >= 0).all(axis=1)]
In [15]:
psChar_23.shape
Out[15]:
(37392, 67)
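One property of the non-negativity filter above is worth noting: a comparison such as `NaN >= 0` evaluates to False, so rows with a missing value in any checked column are dropped along with the negative codes. The sketch below illustrates this on hypothetical toy data (`toy`, `cols`, `strict`, and `lenient` are assumed names) and shows a variant that keeps NaN rows, in case that behavior is preferred.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows: one valid, one with a -1 code, one with a NaN ratio
toy = pd.DataFrame({
    "FreeLunch": [654, -1, 367],
    "StudentTeacherRatio": [19.78, 14.44, np.nan],
})
cols = ["FreeLunch", "StudentTeacherRatio"]

# Same pattern as the filter above: NaN >= 0 is False, so the NaN row is dropped too
strict = toy[(toy[cols] >= 0).all(axis=1)]

# Variant that removes only negative values and keeps rows containing NaN
lenient = toy[(toy[cols].ge(0) | toy[cols].isna()).all(axis=1)]

print(len(strict), len(lenient))
```

Which variant is appropriate depends on whether schools without a reported student-to-teacher ratio should remain in the analysis.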
In [16]:
psChar_23['Locale'].unique()
Out[16]:
array(['32-Town: Distant', '42-Rural: Distant', '41-Rural: Fringe',
       '13-City: Small', '21-Suburb: Large', '33-Town: Remote',
       '31-Town: Fringe', '23-Suburb: Small', '12-City: Mid-size',
       '43-Rural: Remote', '22-Suburb: Mid-size', '11-City: Large'],
      dtype=object)
In [17]:
Locale = {'42-Rural: Distant':'Rural',
            '41-Rural: Fringe':'Rural',
            '43-Rural: Remote':'Rural',
            '32-Town: Distant':'Town',
            '33-Town: Remote':'Town',
            '31-Town: Fringe':'Town',
            '13-City: Small':'City',
            '12-City: Mid-size':'City',
            '11-City: Large':'City',
            '21-Suburb: Large':'Suburb',
            '23-Suburb: Small':'Suburb',
            '22-Suburb: Mid-size':'Suburb'}

Locale
Out[17]:
{'42-Rural: Distant': 'Rural',
 '41-Rural: Fringe': 'Rural',
 '43-Rural: Remote': 'Rural',
 '32-Town: Distant': 'Town',
 '33-Town: Remote': 'Town',
 '31-Town: Fringe': 'Town',
 '13-City: Small': 'City',
 '12-City: Mid-size': 'City',
 '11-City: Large': 'City',
 '21-Suburb: Large': 'Suburb',
 '23-Suburb: Small': 'Suburb',
 '22-Suburb: Mid-size': 'Suburb'}
In [18]:
psChar_23['Locale'] = psChar_23['Locale'].map(Locale)

psChar_23['Locale'].unique()
Out[18]:
array(['Town', 'Rural', 'City', 'Suburb'], dtype=object)
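The hand-written dictionary above could also be derived from the labels themselves, since each locale label shown in `Out[16]` has the form `code-Category: Subtype`. A minimal sketch in plain Python, assuming every label follows that pattern (the helper name `coarse_locale` is hypothetical and not part of the notebook):

```python
def coarse_locale(label: str) -> str:
    """Return the broad category from a label such as '32-Town: Distant'."""
    # Drop the numeric code before the first hyphen, then keep the text before the colon.
    _, _, rest = label.partition("-")
    category, _, _ = rest.partition(":")
    return category.strip()


# A few labels copied from the Locale column's unique values
labels = ["32-Town: Distant", "41-Rural: Fringe", "12-City: Mid-size", "21-Suburb: Large"]
locale_map = {label: coarse_locale(label) for label in labels}
print(locale_map)
```

With pandas, the same function could be passed to `Series.map` in place of the dictionary; the explicit dictionary has the advantage of making every mapping visible for review.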
In [19]:
psChar_23.describe()
Out[19]:
X Y ObjectID Zip Status TotalFreeLunch FreeLunch ReducedLunch MealProgramCertified PreK Kindergarten Grade1 Grade2 Grade3 Grade4 Grade5 Grade6 Grade7 Grade8 Grade9 Grade10 Grade11 Grade12 Ungraded TotMaleEnrollment TotFemaleEnrollment TotalEnrollment StudentTeacherRatio AIANMale AIANFem AIANTotal AsianMale AsianFemale AsianTotal BlackMale BlackFemale BlackTotal HPIMale HPIFemale HPITotal HispanicMale HispanicFemale HispanicTotal TRMale TRFemale TRTotal WhiteMale WhiteFemale WhiteTotal Latitude Longitude
count 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 37392.000000 11978.000000 22550.000000 22629.000000 22642.000000 22602.000000 22538.000000 22257.000000 14797.000000 11647.000000 11603.000000 8163.000000 8083.000000 8065.000000 8051.000000 1952.000000 36722.000000 36722.000000 37392.000000 37392.000000 36672.000000 36670.000000 36700.000000 36717.000000 36720.000000 36722.000000 36713.000000 36711.000000 36717.000000 36661.000000 36663.000000 36689.000000 36722.000000 36722.000000 36722.000000 36719.000000 36720.000000 36720.000000 36722.000000 36722.000000 36722.000000 37392.000000 37392.000000
mean -100.251468 37.290953 36287.616683 63446.899497 1.014549 329.526610 294.008478 35.518132 211.785756 32.782017 72.719335 71.254938 69.543636 71.746350 71.135815 72.845082 110.678448 141.237400 144.294062 223.755115 217.416924 200.690763 191.246429 5.592725 298.766843 283.438457 582.363982 17.143016 3.452471 3.328279 6.775395 18.924422 17.763154 36.684031 45.420178 43.891340 89.299398 1.804724 1.707362 3.509499 90.627880 86.666930 177.294810 16.256734 15.626416 31.882707 122.303170 114.477398 236.780568 37.290953 -100.251468
std 19.640040 6.016159 29092.351157 28541.959418 0.193091 302.519034 276.192414 57.319530 205.370496 38.554464 43.464381 41.309698 40.438018 41.754483 42.033706 46.943546 107.750815 131.743413 135.120098 228.483527 214.329761 199.781788 191.818897 9.279454 244.643853 236.573281 478.938331 13.327592 16.876568 16.257685 33.002197 51.673689 48.952495 100.345711 81.985733 80.888984 162.123805 10.492019 9.835723 20.222761 136.744539 131.278507 267.380097 19.221038 18.730197 37.461901 131.928343 126.522191 257.671934 6.016159 19.640040
min -171.715402 14.140873 1.000000 3901.000000 1.000000 3.000000 0.000000 0.000000 3.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 9.000000 0.610000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 14.140873 -171.715402
25% -118.201758 33.753817 11736.500000 34249.000000 1.000000 136.000000 115.000000 5.000000 73.000000 10.000000 45.000000 45.000000 44.000000 45.000000 44.000000 44.000000 36.000000 34.000000 35.000000 44.000000 44.000000 42.000000 40.000000 0.000000 157.000000 148.000000 308.000000 13.490000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 2.000000 1.000000 3.000000 0.000000 0.000000 0.000000 11.000000 11.000000 22.000000 4.000000 3.000000 7.000000 26.000000 24.000000 50.000000 33.753816 -118.201758
50% -93.889011 36.964123 26215.500000 64014.000000 1.000000 261.000000 230.000000 20.000000 158.000000 24.000000 69.000000 68.000000 66.000000 68.000000 67.000000 68.000000 73.000000 95.000000 97.000000 135.000000 132.000000 120.000000 113.000000 2.000000 244.000000 230.000000 474.000000 16.140000 1.000000 0.000000 1.000000 3.000000 3.000000 6.000000 11.000000 10.000000 21.000000 0.000000 0.000000 0.000000 40.000000 38.000000 78.000000 11.000000 10.000000 22.000000 89.000000 82.000000 171.000000 36.964123 -93.889011
75% -84.473956 39.752610 51456.250000 92405.000000 1.000000 427.000000 387.000000 45.000000 285.000000 43.000000 95.000000 92.000000 90.000000 93.000000 92.000000 94.000000 150.000000 228.000000 232.000000 373.000000 361.000000 330.000000 310.000000 8.000000 358.000000 338.000000 694.000000 20.000000 2.000000 2.000000 3.000000 14.000000 13.000000 27.000000 54.000000 52.000000 106.000000 1.000000 1.000000 2.000000 118.000000 114.000000 233.000000 22.000000 21.000000 44.000000 172.000000 160.000000 332.000000 39.752610 -84.473956
max 145.784430 71.298478 100508.000000 99950.000000 8.000000 5770.000000 5563.000000 1400.000000 2921.000000 903.000000 873.000000 646.000000 665.000000 691.000000 669.000000 727.000000 923.000000 844.000000 930.000000 6251.000000 2855.000000 1293.000000 1339.000000 223.000000 4352.000000 4524.000000 8876.000000 1860.000000 585.000000 513.000000 1098.000000 1335.000000 1224.000000 2559.000000 2195.000000 2207.000000 4402.000000 556.000000 440.000000 996.000000 1947.000000 2118.000000 4065.000000 436.000000 422.000000 828.000000 1989.000000 2305.000000 4294.000000 71.298478 145.784430
In [20]:
psChar_23og = psChar_23.copy()  # independent snapshot of the cleaned data before further filtering

Observations of Descriptive Statistics¶

(Min, Max):

  • TotalFreeLunch (3, 5770); FreeLunch (0, 5563); ReducedLunch (0, 1400); MealProgramCertified (3, 2921)
  • PreK (0, 903)
  • Kindergarten (0, 873)
  • Grade1 (0, 646)
  • Grade2 (0, 665)
  • Grade3 (0, 691)
  • Grade4 (0, 669)
  • Grade5 (0, 727)
  • Grade6 (0, 923)
  • Grade7 (0, 844)
  • Grade8 (0, 930)
  • Grade9 (0, 6251)
  • Grade10 (0, 2855)
  • Grade11 (0, 1293)
  • Grade12 (0, 1339)
  • Ungraded (0, 223)
  • Total Male Enrollment (0, 4352); Total Female Enrollment (0, 4524); Total Enrollment (9, 8876)
  • Student Teacher Ratio (0.61, 1860)
  • American Indian/Alaskan Native Male (0, 585); American Indian/Alaskan Native Female (0, 513); American Indian/Alaskan Native Total (0, 1098)
  • Asian Male (0, 1335); Asian Female (0, 1224); Asian Total (0, 2559)
  • Black Male (0, 2195); Black Female (0, 2207); Black Total (0, 4402)
  • Native Hawaiian/Pacific Islander(HPI) Male (0, 556); Native Hawaiian/Pacific Islander(HPI) Female (0, 440); Native Hawaiian/Pacific Islander(HPI) Total (0, 996)
  • Hispanic Male (0, 1947); Hispanic Female (0, 2118); Hispanic Total (0, 4065)
  • Two or More Races Male (0, 436); Two or More Races Female (0, 422); Two or More Races Total (0, 828)
  • White Male (0, 1989); White Female (0, 2305); White Total (0, 4294)

Mean:

  • TotalFreeLunch: 329.53; FreeLunch: 294.01; ReducedLunch: 35.52; MealProgramCertified: 211.79
  • PreK: 32.78 students
  • Kindergarten: 72.72 students
  • Grade1: 71.25 students
  • Grade2: 69.54 students
  • Grade3: 71.75 students
  • Grade4: 71.14 students
  • Grade5: 72.85 students
  • Grade6: 110.68 students
  • Grade7: 141.24 students
  • Grade8: 144.29 students
  • Grade9: 223.76 students
  • Grade10: 217.42 students
  • Grade11: 200.69 students
  • Grade12: 191.25 students
  • Ungraded: 5.59 students
  • Total Male Enrollment: 298.77 students; Total Female Enrollment: 283.44 students; Total Enrollment: 582.36 students
  • Student Teacher Ratio: 17.14 students/teacher
  • American Indian/Alaskan Native Male: 3.45 students; American Indian/Alaskan Native Female: 3.33 students; American Indian/Alaskan Native Total: 6.78 students
  • Asian Male: 18.92 students; Asian Female: 17.76 students; Asian Total: 36.68 students
  • Black Male: 45.42 students; Black Female: 43.89 students; Black Total: 89.30 students
  • Native Hawaiian/Pacific Islander(HPI) Male: 1.80 students; Native Hawaiian/Pacific Islander(HPI) Female: 1.71 students; Native Hawaiian/Pacific Islander(HPI) Total: 3.51 students
  • Hispanic Male: 90.63 students; Hispanic Female: 86.67 students; Hispanic Total: 177.29 students
  • Two or More Races Male: 16.26 students; Two or More Races Female: 15.63 students; Two or More Races Total: 31.88 students
  • White Male: 122.30 students; White Female: 114.48 students; White Total: 236.78 students

Quartile Ranges (25%, 75%):

  • TotalFreeLunch: (136, 427); FreeLunch: (115, 387); ReducedLunch (5, 45); MealProgramCertified: (73, 285)
  • PreK: (10, 43)
  • Kindergarten: (45, 95)
  • Grade1: (45, 92)
  • Grade2: (44, 90)
  • Grade3: (45, 93)
  • Grade4: (44, 92)
  • Grade5: (44, 94)
  • Grade6: (36, 150)
  • Grade7: (34, 228)
  • Grade8: (35, 232)
  • Grade9: (44, 373)
  • Grade10: (44, 361)
  • Grade11: (42, 330)
  • Grade12: (40, 310)
  • Ungraded: (0, 8)
  • Total Male Enrollment: (157, 358); Total Female Enrollment: (148, 338); Total Enrollment: (308, 694)
  • Student Teacher Ratio: (13.49, 20)
  • American Indian/Alaskan Native Male: (0, 2); American Indian/Alaskan Native Female: (0, 2); American Indian/Alaskan Native Total: (0, 3)
  • Asian Male: (0, 14); Asian Female: (0, 13); Asian Total: (1, 27)
  • Black Male: (2, 54); Black Female: (1, 52); Black Total: (3, 106)
  • Native Hawaiian/Pacific Islander(HPI) Male: (0, 1); Native Hawaiian/Pacific Islander(HPI) Female: (0, 1); Native Hawaiian/Pacific Islander(HPI) Total: (0, 2)
  • Hispanic Male: (11, 118); Hispanic Female: (11, 114); Hispanic Total: (22, 233)
  • Two or More Races Male: (4, 22); Two or More Races Female: (3, 21); Two or More Races Total: (7, 44)
  • White Male: (26, 172); White Female: (24, 160); White Total: (50, 332)

Standard Deviation:

Higher than the mean: Reduced Lunch, PreK, Grade9, Grade12, Ungraded, and all student race/ethnicity counts.

Lower than the mean: Total Free Lunch, Free Lunch, Meal Program Certified, Kindergarten through Grade 11 (except Grade 9), Total Male Enrollment, Total Female Enrollment, Total Enrollment, and Student Teacher Ratio.

FRPL counts: the standard deviations are moderately lower than the means, except for ReducedLunch, whose standard deviation (57.32) is higher than its mean (35.52).

The standard deviations for PreK, Grade 9, and Grade 12 are higher than the means, while those for all other grades are lower.

The standard deviations for the enrollment counts are lower than the means.

The standard deviation for the student-teacher ratio is lower than the mean.

The standard deviations for all student demographic counts are higher than the means, though the gap for White students is much smaller than for the other races/ethnicities.
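One way to put the "standard deviation versus mean" comparison on a common scale is the coefficient of variation (standard deviation divided by mean). The sketch below is illustrative only: it hard-codes a handful of mean and standard deviation values copied from the `describe()` output rather than recomputing them from `psChar_23`.

```python
# (mean, std) pairs copied from the describe() output above
stats = {
    "ReducedLunch": (35.518132, 57.319530),
    "TotalEnrollment": (582.363982, 478.938331),
    "BlackTotal": (89.299398, 162.123805),
    "HispanicTotal": (177.294810, 267.380097),
    "WhiteTotal": (236.780568, 257.671934),
}

# Coefficient of variation: std / mean. A value above 1 means the spread exceeds the mean.
cv = {name: std / mean for name, (mean, std) in stats.items()}

for name, value in cv.items():
    print(f"{name}: CV = {value:.2f}")
```

A ratio above 1 corresponds to the "higher than the mean" cases listed above, and comparing the ratios across columns shows how much smaller the gap is for White enrollment than for Black or Hispanic enrollment.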

Mean/Median Closeness:

The medians for the free/reduced lunch counts are lower than the means.

For the columns covering the elementary school grades, the medians are close to but lower than the means. For the other grades, the medians are not as close, but are still lower than the means.

The median for the student-teacher ratio (16.14) is close to the mean (17.14).

The medians for the Black and Hispanic student counts are far lower than the means (Black Total: median 21 vs. mean 89.30; Hispanic Total: median 78 vs. mean 177.29), which suggests right-skewed distributions in which a relatively small number of schools enroll most of these students.
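The gap between mean and median can be summarized with Pearson's second (median) skewness coefficient, 3 × (mean − median) / std. The following is a rough sketch using values copied from the `describe()` output; it is a quick summary-statistic check, not a substitute for computing skewness from the full data.

```python
# (mean, median, std) copied from the describe() output above
stats = {
    "BlackTotal": (89.299398, 21.0, 162.123805),
    "HispanicTotal": (177.294810, 78.0, 267.380097),
    "WhiteTotal": (236.780568, 171.0, 257.671934),
    "StudentTeacherRatio": (17.143016, 16.14, 13.327592),
}


def median_skewness(mean: float, median: float, std: float) -> float:
    """Pearson's second skewness coefficient: positive when the mean exceeds the median."""
    return 3 * (mean - median) / std


skew = {name: median_skewness(*values) for name, values in stats.items()}

for name, value in skew.items():
    print(f"{name}: median skewness = {value:.2f}")
```

Positive values indicate a longer right tail; larger values for the Black and Hispanic totals than for the student-teacher ratio would be consistent with the observations above.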

In [21]:
print(psChar_23["StudentTeacherRatio"].describe())
count    37392.000000
mean        17.143016
std         13.327592
min          0.610000
25%         13.490000
50%         16.140000
75%         20.000000
max       1860.000000
Name: StudentTeacherRatio, dtype: float64
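The maximum of 1860 students per teacher sits far above the 75th percentile of 20, which points to extreme outliers. A common rule of thumb (Tukey's 1.5 × IQR fences) gives a sense of scale. This sketch uses only the quartiles printed above and is an illustration, not a filtering step the notebook applies.

```python
# Quartiles and extremes copied from the StudentTeacherRatio describe() output above
q1, q3 = 13.49, 20.0
observed_min, observed_max = 0.61, 1860.0

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"IQR: {iqr:.2f}")
print(f"Fences: ({lower_fence:.3f}, {upper_fence:.3f})")
print("Minimum below lower fence:", observed_min < lower_fence)
print("Maximum above upper fence:", observed_max > upper_fence)
```

Both extremes fall outside the fences under this rule. The scatter plots later in the notebook cap the y-axis at 50, which hides the most extreme ratios rather than removing them from the data.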
In [22]:
psChar_23.columns
Out[22]:
Index(['X', 'Y', 'ObjectID', 'StateABR', 'SchoolName', 'Street1', 'Street2',
       'City', 'State', 'Zip', 'Zip4', 'Charter', 'Virtual', 'LowestGrade',
       'HighestGrade', 'SchoolLevel', 'Status', 'SchoolType', 'Status_Text',
       'Locale', 'County', 'TotalFreeLunch', 'FreeLunch', 'ReducedLunch',
       'MealProgramCertified', 'PreK', 'Kindergarten', 'Grade1', 'Grade2',
       'Grade3', 'Grade4', 'Grade5', 'Grade6', 'Grade7', 'Grade8', 'Grade9',
       'Grade10', 'Grade11', 'Grade12', 'Ungraded', 'TotMaleEnrollment',
       'TotFemaleEnrollment', 'TotalEnrollment', 'StudentTeacherRatio',
       'AIANMale', 'AIANFem', 'AIANTotal', 'AsianMale', 'AsianFemale',
       'AsianTotal', 'BlackMale', 'BlackFemale', 'BlackTotal', 'HPIMale',
       'HPIFemale', 'HPITotal', 'HispanicMale', 'HispanicFemale',
       'HispanicTotal', 'TRMale', 'TRFemale', 'TRTotal', 'WhiteMale',
       'WhiteFemale', 'WhiteTotal', 'Latitude', 'Longitude'],
      dtype='object')
In [24]:
print(type(psChar_23))
<class 'pandas.core.frame.DataFrame'>

Scatter Plots¶

In [25]:
import matplotlib.pyplot as plt
import numpy as np

# Drop rows where the FRPL count exceeds enrollment, then compute the FRPL rate (%).
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added below.
psChar_23 = psChar_23[psChar_23["TotalFreeLunch"] <= psChar_23["TotalEnrollment"]].copy()
psChar_23["LunchRate"] = (psChar_23["TotalFreeLunch"] / psChar_23["TotalEnrollment"]) * 100

race_colors = {"BlackTotal": "tab:blue", "HispanicTotal": "tab:orange", "WhiteTotal": "tab:green"}
race_labels = {"BlackTotal": "Predominantly Black School",
               "HispanicTotal": "Predominantly Latino/Hispanic School",
               "WhiteTotal": "Predominantly White School"}
# Label each school by the largest of its Black, Hispanic, and White enrollment counts
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)

fig, ax = plt.subplots()
for race, color in race_colors.items():
    subset = psChar_23[psChar_23["PredominantRace"] == race]
    ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"],
               c=color, s=45, label=race_labels[race], alpha=0.3, edgecolors='none')

ax.legend(loc='upper right', shadow=True)
ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (by Race)")

# Fixed axis limits; ratios above 50 students per teacher are not shown
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)

plt.show()
[Figure: scatter plot of FRPL eligibility (%) against students per teacher, colored by each school's predominant race]

Hispanic/Latino Demographic¶

In [31]:
# Plot only schools where Hispanic enrollment is the largest of the three groups
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)
subset = psChar_23[psChar_23["PredominantRace"] == "HispanicTotal"]

fig, ax = plt.subplots()
ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"],
           c="tab:orange", s=45, alpha=0.3, edgecolors='none')

ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Predominantly Hispanic Schools)")

# Fixed axis limits; ratios above 50 students per teacher are not shown
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)

plt.show()
[Figure: scatter plot of FRPL eligibility (%) against students per teacher for schools where Hispanic enrollment is the largest group]

Black Demographic¶

In [32]:
# Plot only schools where Black enrollment is the largest of the three groups
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)
subset = psChar_23[psChar_23["PredominantRace"] == "BlackTotal"]

fig, ax = plt.subplots()
ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"],
           c="tab:blue", s=45, alpha=0.3, edgecolors='none')

ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Predominantly Black Schools)")

# Fixed axis limits; ratios above 50 students per teacher are not shown
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)

plt.show()
[Figure: scatter plot of FRPL eligibility (%) against students per teacher for schools where Black enrollment is the largest group]

White Demographic¶

In [33]:
# Plot only schools where White enrollment is the largest of the three groups
psChar_23["PredominantRace"] = psChar_23[["BlackTotal", "HispanicTotal", "WhiteTotal"]].idxmax(axis=1)
subset = psChar_23[psChar_23["PredominantRace"] == "WhiteTotal"]

fig, ax = plt.subplots()
ax.scatter(subset["LunchRate"], subset["StudentTeacherRatio"],
           c="tab:green", s=45, alpha=0.3, edgecolors='none')

ax.grid(True)
ax.set_xlabel("% of Students w/ FRPL Eligibility")
ax.set_ylabel("Students per Teacher")
ax.set_title("FRPL Eligibility & Student-Teacher Ratio (Predominantly White Schools)")

# Fixed axis limits; ratios above 50 students per teacher are not shown
ax.set_xlim(0, 100)
ax.set_ylim(0, 50)

plt.show()
[Figure: scatter plot of FRPL eligibility (%) against students per teacher for schools where White enrollment is the largest group]

Bubble Plot¶

In [29]:
psChar_23.head(1)
Out[29]:
X Y ObjectID StateABR SchoolName Street1 Street2 City State Zip Zip4 Charter Virtual LowestGrade HighestGrade SchoolLevel Status SchoolType Status_Text Locale County TotalFreeLunch FreeLunch ReducedLunch MealProgramCertified PreK Kindergarten Grade1 Grade2 Grade3 Grade4 Grade5 Grade6 Grade7 Grade8 Grade9 Grade10 Grade11 Grade12 Ungraded TotMaleEnrollment TotFemaleEnrollment TotalEnrollment StudentTeacherRatio AIANMale AIANFem AIANTotal AsianMale AsianFemale AsianTotal BlackMale BlackFemale BlackTotal HPIMale HPIFemale HPITotal HispanicMale HispanicFemale HispanicTotal TRMale TRFemale TRTotal WhiteMale WhiteFemale WhiteTotal Latitude Longitude LunchRate PredominantRace
0 -86.2062 34.2602 1 AL Albertville Middle School 600 E Alabama Ave NaN Albertville AL 35950 No Not Virtual 07 08 Middle 1 Regular School Currently operational Town Marshall County 697 654 43 587 NaN NaN NaN NaN NaN NaN NaN NaN 440.0 450.0 NaN NaN NaN NaN NaN 459.0 431.0 890.0 19.78 4.0 1.0 5.0 4.0 2.0 6.0 15.0 14.0 29.0 0.0 1.0 1.0 251.0 251.0 502.0 17.0 15.0 32.0 168.0 147.0 315.0 34.2602 -86.2062 78.314607 HispanicTotal
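The two derived columns at the end of this row can be checked by hand. The sketch below reproduces `LunchRate` and `PredominantRace` for Albertville Middle School in plain Python, using only the values shown in the row above; the variable names are illustrative.

```python
# Values copied from the first row of the DataFrame shown above
total_free_lunch = 697
total_enrollment = 890
race_totals = {"BlackTotal": 29.0, "HispanicTotal": 502.0, "WhiteTotal": 315.0}

# LunchRate: share of enrolled students counted in TotalFreeLunch, as a percentage
lunch_rate = total_free_lunch / total_enrollment * 100

# PredominantRace: the column with the largest count, which is what idxmax(axis=1) returns per row
predominant = max(race_totals, key=race_totals.get)

print(round(lunch_rate, 6), predominant)
```

Note that "predominant" here means the largest of the three compared groups, not necessarily a majority of the school's enrollment.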
In [30]:
import math
import plotly.graph_objects as go

# Draw a reproducible random sample of 1,000 schools for the bubble plot
sample_size = 1000
sample = psChar_23.sample(n=sample_size, random_state=1)

hover_text = []
bubble_size = []

for index, row in sample.iterrows():
    hover_text.append(('School: {SchoolName}<br>'+
                      'Lunch Rate: {LunchRate:.2f}<br>'+
                      'Students per Teacher: {StudentTeacherRatio}<br>'+
                      'Total Enrollment: {TotalEnrollment}<br>').format(SchoolName=row['SchoolName'],
                                            LunchRate=row['LunchRate'],
                                            StudentTeacherRatio=row['StudentTeacherRatio'],
                                            TotalEnrollment=row['TotalEnrollment']))
    bubble_size.append(math.sqrt(row['TotalEnrollment']))

sample['text'] = hover_text
sample['size'] = bubble_size
sizeref = 2.*max(sample['size'])/(25**2)

race_categories = ['BlackTotal', 'HispanicTotal', 'WhiteTotal']
race_data = {race: sample[sample["PredominantRace"] == race] for race in race_categories}

fig = go.Figure()

for race, subset in race_data.items():
    fig.add_trace(go.Scatter(
        x=subset["LunchRate"],
        y=subset["StudentTeacherRatio"],
        name=race,
        text=subset["text"],
        marker_size=subset['size'],
        ))

fig.update_traces(mode='markers', marker=dict(sizemode='area',
                                              sizeref=sizeref, line_width=2))

fig.update_layout(
    title="FRPL Eligibility & Student-Teacher Ratio",
    xaxis=dict(title="% of Students w/ FRPL Eligibility", gridcolor='white', gridwidth=2),
    yaxis=dict(title="Students per Teacher", gridcolor='white', gridwidth=2, range=[0, 50],  
        dtick=20),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
)

fig.show()
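The `sizeref` expression in the cell above follows the pattern `2 * max(size) / (desired maximum marker size ** 2)`, with 25 as the desired maximum. The sketch below works through that arithmetic in plain Python for three enrollment values taken from the earlier `describe()` output (minimum, median, maximum). It assumes that in area mode the drawn diameter grows with the square root of the size value; the exact pixel sizes Plotly renders are not verified here, and the sampled data will have a different maximum.

```python
import math

enrollments = [9, 474, 8876]  # min, median, and max TotalEnrollment from describe()
sizes = [math.sqrt(n) for n in enrollments]  # same transform as bubble_size in the notebook

desired_max_px = 25  # the constant used in the notebook's sizeref expression
sizeref = 2.0 * max(sizes) / desired_max_px**2

# Assumption: area scaling, so diameter relative to the largest bubble is sqrt(size / max size)
diameters = [desired_max_px * math.sqrt(s / max(sizes)) for s in sizes]

for n, d in zip(enrollments, diameters):
    print(f"enrollment {n:>5}: about {d:.1f} px")
```

Because the size values are already square roots of enrollment, area scaling under this assumption makes bubble diameter grow with roughly the fourth root of enrollment, which compresses the visual difference between small and large schools.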